ABSTRACT

If computing system performance is degradable then, as 
recognized in a number of recent studies, system evaluation 
must deal simultaneously with aspects of both performance 
and reliability. One approach is the evaluation of a 
system's "performability" which, relative to a specified
performance variable Y, generally requires solution of the
probability distribution function of Y. Prior work on
performability models and solution methods has focused on the
case where Y is discrete; in this paper we consider 
continuous-valued variables of the type usually addressed in 
performance evaluation (e.g., average throughput rate, aver- 
age response time, etc.). The models used are similar to 
those employed in performance modeling (i.e., Markovian
queueing models) but are extended so as to account for vari- 
ations in structure due to faults. In particular, we con- 
sider the modeling of a degradable buffer/multiprocessor 
system whose performance Y is the (normalized) average 
throughput rate realized during a bounded interval of time. 
To avoid known difficulties associated with exact transient 
solutions, we employ an approximate decomposition of the 
model, permitting certain submodels to be solved in equili- 
brium. These solutions are then incorporated in a model 
with fewer transient states and by solving the latter, we 
obtain a closed-form solution of the system's performabil- 
ity. In conclusion, some applications of this solution are 
discussed and illustrated, including an example of design 


I. INTRODUCTION 


In the evaluation of computing systems, issues of performance and 
reliability have traditionally been distinguished by regarding "per- 
formance" as "how well the system performs, provided it is correct" 
(see [1]-[3], for example) and regarding "reliability" as "the
probability of performing successfully" (see [4]-[7], for example).
Although this distinction is meaningful for hardware and software 
architectures which exhibit "all or nothing" behavior in the presence 
of faults, it becomes blurred in the context of distributed,
multifunction systems (computers, computer-communication networks,
operating systems, data bases, etc.) where performance is "degradable." As
recognized in a number of recent studies [8]-[14], the evaluation of
degradable systems calls for unified performance-reliability measures
which, in the terminology of [12], quantify a system's
"performability." Such measures, in turn, call for appropriate generalizations of
the types of analytic models and solution methods employed in
performance and reliability evaluation.

To accommodate these needs, a general modeling framework was
introduced in [8] (and subsequently refined in [12]) wherein the
"performance" of a system S over a specified time period T is represented
by a random variable $Y_S$ taking values in a set A. Elements of A are
the "accomplishment levels" (performance outcomes) to be distinguished
in the evaluation process, e.g., in the special case of reliability
evaluation, A = {success, failure}. At the other extreme, performance
may range over a continuum of values, e.g., A is the real number
interval $(0,\infty)$ where a level $a \in A$ is the "throughput rate of S
averaged over T." With respect to a designated performance variable $Y_S$,
the "performability" of S is the probability measure $p_S$ induced by $Y_S$
where, for any measurable set B of accomplishment levels ($B \subseteq A$),

$p_S(B)$ = probability that S performs at a level in B.

(For more precise definitions of these concepts, the reader is
referred to [12].)


Performability evaluation thus entails a complete probabilistic
description of the performance variable $Y_S$, as opposed to partial
information such as its expected value, its variance, etc. In general
(assuming $Y_S$ is real-valued), this description is provided by the
probability distribution function (PDF) of $Y_S$, i.e., the function $F_{Y_S}$
where $F_{Y_S}(y) = \mathrm{Prob}[Y_S \le y]$. By the definition of performability,
it follows that $p_S$ is uniquely determined by $F_{Y_S}$; in particular, if
$B_y = \{a \in A \mid a \le y\}$ then $p_S(B_y)$ (the ability to perform within $B_y$)
coincides with $F_{Y_S}(y)$. To solve $F_{Y_S}$, performability modeling calls
for an appropriate representation of the total system S by a
stochastic process $X_S$ (the "base model" of S) so that each state trajectory
(sample path) of $X_S$ corresponds to a specific value of $Y_S$. (In the
terminology of [8], [12], this correspondence is referred to as the
"capability function" of S.) Typically, the modeling process will
also involve the introduction of intermediate models between $X_S$ and
$Y_S$, so as to facilitate the solution of $F_{Y_S}$.


Prior work on the development of performability models and solu- 
tion methods has dealt primarily with discrete performance variables 
ranging over a countable and typically finite set of accomplishment 
levels. In the overall process of system design and validation, the 
use of these discrete variable methods is best suited to validation of 
a completed system design with respect to "bottom line" performability 
requirements. However, if the evaluation results disclose that a 
design is deficient, the performability data need not be indicative of 
just how the design should be modified. This is due to the fact that 
lower level, design-oriented details are often suppressed by a high 
level, discrete performance variable. Hence early validation (during
the design process) at lower system and subsystem levels is required 
if negative results are to indicate how the design should be modified. 


In the latter validation context, and more generally, in the
context of "design aids," we believe that performability models and
solutions can likewise play an important role. Here, there is a need to
consider more detailed aspects of system and subsystem behavior (e.g., 
speed, responsiveness, etc.) which, when modeled as performance vari- 
ables, can assume a continuum of values. Accordingly, the evaluation 
methods called for here must deal with continuous performance vari- 
ables as well as discrete performance variables. Moreover, to support 
the investigation of various design trade-offs, there is a need to 
develop methods which yield closed-form performability solutions,
expressed as a function of the underlying model parameters. 

In the discussion that follows, we demonstrate that closed-form 
solutions of performability can indeed be obtained for a continuous 
performance variable. The system we consider consists of a degradable 
multiprocessor with an input buffer (queue) for the temporary storage 
of computational tasks that arrive randomly at the input. The perfor- 
mance in question is the fraction of incoming tasks processed during 
utilization or, equivalently, the normalized average throughput rate. 
In constructing the base model of this system (Section II), we extend 
the kind of Markovian queueing models that are currently employed to 
evaluate the performance of a (fault-free) computer (see [1]-[3], for
example). When so extended, these models are able to represent
variations in structure, due to faults, as well as variations in internal
state and environment. In solving the performability (Section III), 
our strategy is to lump states of the base model so that, within a 
lump, the model exhibits a steady-state behavior (to a close approxi- 
mation). This permits decomposition of the solution into an equili- 
brium (steady-state) part and a transient part. The equilibrium part 
employs known solutions from queueing theory; the transient part is 
more difficult and calls for an approach which, to the best of our 
knowledge, is new. Here, through a hierarchical decomposition of the 
capability function and an appropriate partitioning of the
accomplishment set, we are able to obtain the desired solution.

II. MODEL CONSTRUCTION 

The system we evaluate is a total system S = (C,E) where, infor- 
mally, computer C and environment E can be described as follows. C is 
a degradable multiprocessor system consisting of N identical
processors ($N \ge 2$) and a buffer (queue) for temporary storage of incoming
tasks (see Fig. 1). The buffer B is assumed to have a finite capacity
L ($L \ge 0$), that is, B is capable of storing at most L tasks. (Note
that, by allowing L = 0, we are including the case where C actually 
has no buffer at all.) The environment E is the arrival of computa- 
tional tasks at the input to the computer. We assume here that tasks 
arrive randomly (one at a time) and that there is no upper bound on 
the total number of arrivals. More detailed descriptions of C and E 
will be given once we specify the performance variable in question. 

Performance Variable 

Regarding performance, we presume that, ideally, [...]
the computer to process all tasks that arrive [...]
utilization period T. However, due to the finite [...]
buffer and to faults which may occur in the [...]
ideal behavior will generally not be attainable. [...]
interesting measure of performance in this context [...]
task arrivals that C in fact processes during utilization. [...]
this more precisely, if $t \in [0,\infty)$, let $A_t$ and [...]
variables:

$A_t$ = number of tasks that arrive during [0,t]    (1)

$D_t$ = number of tasks that are processed during [0,t]    (2)

Then, relative to the utilization period T = [0,t], [...]
performance of S to be the random variable

$$Y_S = \frac{D_t}{A_t} = \text{fraction of arrived tasks processed during } T. \tag{3}$$

(Note that $0 \le Y_S \le 1$.) Alternatively, if we let

$$\alpha_t = \frac{A_t}{t} = \text{average arrival rate during } [0,t] \tag{4}$$

$$\delta_t = \frac{D_t}{t} = \text{average throughput rate of S during } [0,t] \tag{5}$$

then

$$Y_S = \frac{D_t}{A_t} = \frac{D_t/t}{A_t/t} = \frac{\delta_t}{\alpha_t} = \frac{\text{average throughput rate of S during } T}{\text{average arrival rate during } T}. \tag{6}$$

In other words, the performance of S can also be interpreted as the
"normalized average throughput rate," normalized with respect to the
average arrival rate and averaged over the utilization period
T = [0,t].

To solve the PDF of $Y_S$ and, hence, the performability of S, the
specific nature of the computer C and environment E must be spelled
out in more detail. We begin with the environment.


Environment Model

If, as earlier (see (1)), we let $A_t$ denote the number of task
arrivals during the interval [0,t], the environment E can be regarded
as a stochastic process

$$X_E = \{A_t \mid t \in [0,\infty)\} \tag{7}$$

where the variables $A_t$ take values in the state set $Q_E = \{0,1,2,\ldots\}$.
To designate the specific nature of $X_E$, we suppose further that
arrivals are "purely random" in the sense that interarrival times are
independent random variables with identical exponential distributions.
This is equivalent to saying that the arrival process $X_E$ is a Poisson
process. Accordingly, if we let

$\alpha$ = average arrival rate (in the long run),    (8)

that is, $\alpha = \lim_{t \to \infty} \alpha_t$ (see (4)), then $X_E$ is uniquely determined by $\alpha$.
More precisely, in the terminology of Markov processes, $X_E$ is a
special "pure birth" process where the transition rate between each pair
of successive states is $\alpha$.
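The arrival model can be illustrated with a small simulation. The sketch below assumes nothing beyond the text's description (independent, exponentially distributed interarrival times); the function name `count_arrivals` and the numeric values are illustrative choices, not taken from the paper.

```python
import random

def count_arrivals(alpha, t, rng):
    """Number of arrivals in [0, t] when interarrival times are exponential(alpha)."""
    clock, arrivals = 0.0, 0
    while True:
        clock += rng.expovariate(alpha)   # next interarrival time
        if clock > t:
            return arrivals
        arrivals += 1

rng = random.Random(0)                    # fixed seed for repeatability
alpha, t = 2.0, 5000.0                    # illustrative values only
alpha_t = count_arrivals(alpha, t, rng) / t   # average arrival rate during [0, t]
```

For a long interval the observed rate `alpha_t` should sit close to the long-run rate `alpha`, which is the sense in which the single parameter determines the process.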

Computer Model 

As depicted in Fig. 1, the fault-free structure of the computer 
is determined by values of two basic parameters: 

N = number of processors ($N \ge 2$)    (9)

L = storage capacity of the buffer ($L \ge 0$).    (10)

To describe how the system is altered by faults, we assume the
following. If C is fault-free (i.e., resources $B, P_1, P_2, \ldots, P_N$ are
fault-free) then all processors are active (no "stand-bys") and are
able to process tasks concurrently. Each processor is self-testing
and, in the presence of a single faulty processor, the system is able
to recover (with a specified "coverage") to an (N-1)-processor
configuration. In this configuration, C behaves the same as a fault-free
version of the system with N-1 processors, provided $N-1 \ge 2$. When
only a single processor remains fault-free, fault recovery is no
longer possible. The input buffer B is assumed to be nondegradable,
i.e., it either performs correctly or fails. (In a more general
example, the buffer could likewise be treated as a degradable resource.)
Either failure to recover from a processor fault or failure of the
buffer results in a total loss of processing capability (system
failure).

Under the above assumptions, the relevant structural
configurations of C can be represented by the state set

$$Q_R = \{0,1,\ldots,N\} \tag{11}$$

Closed-Form Solutions of Performability II. MODEL CONSTRUCTION


with the following interpretation:

State    Fault-free resources
N        Buffer and N processors
N-1      Buffer and N-1 processors
...      ...
1        Buffer and 1 processor
0        System failure.


Modeling how structure varies (probabilistically) as a function of
time thus reduces to a standard problem typically encountered in
reliability modeling. In this regard, let us assume that resources fail
(become faulty) at constant rates equal to their respective long run
average failure rates. More specifically, for each of the processors,
let

$\lambda_p$ = processor failure rate    (12)

and let $c_p$ denote the coverage referred to above, i.e.,

$c_p$ = probability of recovering from a processor fault.    (13)

For the buffer B with capacity L, we assume that B is constructed from
L "stages" (a stage can store a single task) where, for each stage,

$\lambda_b$ = stage failure rate.    (14)

Then, if stages fail independently and any stage failure results in a
buffer failure, it follows that

$$\lambda_B = \text{buffer failure rate} = L\lambda_b. \tag{15}$$

The transition rate $\lambda_i$ from structure state i is just the
accumulated failure rate of fault-free resources associated with state i,
that is,

$$\lambda_i = i\lambda_p + \lambda_B = i\lambda_p + L\lambda_b. \tag{16}$$

The combined "coverage" $c_i$ in state i (when interpreted directly in
terms of Fig. 2) is the probability of a transition to state i-1 given
a transition from state i. In terms of resource faults, $c_i$ is
therefore the probability that a transition from state i is caused by a
processor fault and, in turn, C is able to recover from that fault via
self-test and reconfiguration. As the latter is specified by the
coverage parameter $c_p$ (13), it follows, via a simple conditional
probability argument, that

$$c_i = \frac{i\lambda_p c_p}{\lambda_i} = \frac{c_p}{1 + \dfrac{L\lambda_b}{i\lambda_p}}. \tag{17}$$
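The conditional probability argument behind (16) and (17) can be sketched numerically. The function name `structure_rates` and the parameter values below are illustrative assumptions, not taken from the paper; the sketch simply weights the processor coverage by the share of the total failure rate that is due to processors.

```python
def structure_rates(i, L, lam_p, lam_b, c_p):
    """Return (lambda_i, c_i) for a structure state i >= 1."""
    lam_B = L * lam_b                  # buffer failure rate: L independent stages
    lam_i = i * lam_p + lam_B          # total rate of leaving structure state i
    # P(fault is a processor fault | a fault occurs) * P(recovery | processor fault)
    c_i = (i * lam_p / lam_i) * c_p
    return lam_i, c_i

# Hypothetical rates per hour, chosen only to exercise the formulas.
lam_i, c_i = structure_rates(i=4, L=5, lam_p=1e-3, lam_b=1e-4, c_p=0.99)
```

With these numbers the buffer accounts for one ninth of the total failure rate, so the combined coverage drops from 0.99 to 0.88.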


For each structure state $i \in Q_R$, we now proceed to construct a
submodel of C that accounts for the internal state behavior of C when
its structure is fixed at i. In case C is fault-free (structure state
N), the system is presumed to behave as follows. Given that the
buffer is empty and at least one processor is idle, processing of an
incoming task is immediately undertaken by an idle processor. If all
processors are busy, an incoming task is stored in buffer B, provided
B is not "full" (i.e., the number of tasks stored in B is less than
L); as soon as one of the processors becomes idle, it begins to
process the task that was least recently stored in the buffer. Finally,
if B is full when a task arrives, the task is rejected (lost) and
hence not processed at all. Note that this last condition is the one
which directly affects the performance $Y_S = D_t/A_t$ (see (3)) when C is
fault-free, since $D_t < A_t$ if and only if tasks are lost during [0,t].
When only i processors are fault-free (structure state i, $i \ge 1$), the
system behaves as described above if each occurrence of the word
"processor" is replaced by "fault-free processor." Upon failure of the
system (structure state 0), processing ceases and any incoming task is
rejected.

On closer inspection and in queueing theoretic terms (see [16],
[17] for example), structure state i ($1 \le i \le N$) can be viewed as a
queueing system with i servers (the fault-free processors), a finite
queue of size L (the buffer), and a first-come-first-served queueing
discipline (the task scheduling discipline). If, further, we assume
that the processing times for each fault-free processor are
independent and exponentially distributed with parameter

$\mu$ = average processing rate (in the long run),    (18)

then structure state i is an instance of an M/M/m/K queueing system
where

m = i, the number of servers
K = i+L, the storage capacity of the    (19)
    system (servers plus queue).

(M/M denotes the fact that the interarrival times and service times
are exponentially distributed.)

With this identification, a submodel of C in structure state i
follows immediately by taking the internal states to be the set

$$Q_{I,i} = \{j \mid 0 \le j \le i+L\} \tag{20}$$

where

j = number of tasks in C,

that is, the number of tasks being processed plus the number of tasks
stored in the buffer. (Thus, at the extremes, j = 0 says that all
fault-free processors are idle and buffer B is empty; if j = i + L,
all fault-free processors are busy and B is full.) Letting $X_{I,i}$ denote
the submodel in question, that is, the stochastic representation of an
M/M/i/i+L queue with state set $Q_{I,i}$, it follows that $X_{I,i}$ is the
"birth-death" Markov process given by the state-transition-rate
diagram of Fig. 3. (Although $X_{I,i}$ is consistent with our
interpretation even when i = 0, we will take $X_{I,0}$ to be a degenerate Markov
process with a single absorbing state j = 0.) The model parameters
indicated in Fig. 3 are the task arrival rate $\alpha$ (8) associated with the
environment $X_E$, the processing rate $\mu$ (18) of each processor, and the
capacity L (10) of the buffer. In other words, "births" correspond to
tasks accepted by C and "deaths" to tasks completed by C. In
particular, the zero acceptance rate in state i + L reflects the fact that
tasks are rejected when the buffer is full. Finally, if $j \ge i$ (in
which case all fault-free processors are busy), the completion rate of
$i\mu$ reflects the assumed (ideal) parallel processing capability of the
multiprocessor.
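The transition rates described for the submodel can be tabulated directly. The sketch below is an illustration under the stated assumptions (arrival rate, per-processor rate, i fault-free processors, buffer capacity L); the function name `birth_death_rates` and the example values are hypothetical.

```python
def birth_death_rates(i, L, alpha, mu):
    """Birth and death rates of the M/M/i/i+L submodel, indexed by state j."""
    K = i + L                                    # largest internal state (system full)
    # Tasks are accepted at rate alpha unless the system is full.
    births = [alpha if j < K else 0.0 for j in range(K + 1)]
    # Each busy fault-free processor completes tasks at rate mu.
    deaths = [min(j, i) * mu for j in range(K + 1)]
    return births, deaths

births, deaths = birth_death_rates(i=2, L=3, alpha=1.5, mu=1.0)
```

The completion rate grows linearly up to j = i and then stays at i times the per-processor rate, which is the ideal parallel processing assumption mentioned in the text.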

Composing the internal state submodels $X_{I,i}$ (Fig. 3) with the
structure model $X_R$ (Fig. 2), C can be modeled as a single Markov
process $X_C$ with state set

$$Q_C = \{(i,j) \mid i \in Q_R,\ j \in Q_{I,i}\}$$

where, from the definitions of $Q_R$ (11) and $Q_{I,i}$ (20), a state
q = (i,j) represents both the structure and internal state of C with

i = structural configuration of C,
j = number of tasks in C.

The state-transition-rate diagram of the composite model $X_C$ is shown
in Fig. 4. For a structure state i such that $2 \le i \le N$, the
transition to state (0,0) indicated at the far left of the diagram applies
to each state (i,j) in the corresponding row of the diagram.

The computer model $X_C$ together with the environment model $X_E$ (7)
thus constitute the base model of the system S = (C,E). However, with
respect to the performance variable $Y_S$ (see (3), (6)) we find that the
relevant aspects of $X_E$ have been incorporated in $X_C$, so that $X_C$ can
serve as the base model of S. Accordingly, we take the Markov process
$X_C$ to be the base model $X_S$ and will subsequently refer to it by either
name.

As a base model, $X_C$ is similar in both its purpose and its
appearance (Fig. 4) to the kind of "workload models" considered by Gay
and Ketelsen [10]. One difference is that we make no assignment of
"capacities" to the states of the model. Rather, the computational
capacity of a given structural configuration is implied by certain
transition rates, i.e., in structure state i, the maximum processing
rate is $i\mu$ tasks/unit time. Also, in keeping with traditional usage
(see [1]-[3]), we prefer to reserve the term "workload" for the
external demands placed on the computer. A workload model is thus part of
(and often coincides with) a model of the computer's environment,
e.g., the arrival process $X_E$ (7) is the "workload model" for the
example in question. The major difference, however, is that the systems
considered in [10] are repairable, resulting in irreducible Markov
models where all states are recurrent non-null. The model of Fig. 4, on
the other hand, has transient (non-recurrent) states; indeed, all the
states of $X_C$ are transient except for the absorbing state (0,0). This
difference has a considerable impact on techniques that can be used to
solve the model, as we discuss in the section that follows.

III. MODEL SOLUTION 

As pointed out in the introductory remarks of Section I, solving
a system's performability is tantamount to solving the probability
distribution function (PDF) of the performance variable. To this end
and to simplify notation for the system S in question, let Y denote $Y_S$
(as specified in (3) or (6)) and let $F_Y$ denote the PDF of Y, that is

$$F_Y(y) = \mathrm{Prob}[Y \le y]. \tag{21}$$

Then, ideally, we would like to solve $F_Y$ as an exact formulation of
$F_Y(y)$, expressed in terms of y, t (the duration of utilization), and
the parameters of the base model $X_S = X_C$ (Fig. 4). The parameters
involved, including those derived from basic parameters, are
summarized in Table 1. Such a formulation, however, would require (among
other things) an exact, time-dependent solution of the state
probabilities of the base model. Although this is possible, in principle, it
appears to be fraught with practical difficulties. Indeed, for even
the simplest models of this sort, e.g., an M/M/1 queue, such a
solution is far from trivial (see [16], pp. 73-78). On the other hand, if
we are willing to settle for a good approximate solution, many of
these difficulties may be circumvented.


Choosing the latter approach, we note first that, in a form lying
between equations (3) and (6), the performance variable $Y = Y_S$ can be
expressed as

$$Y = \frac{D_t}{\alpha_t t}. \tag{22}$$

To further decompose this expression, for each structure state i
($0 \le i \le N$), let us define a random variable $D_t^i$ that represents the
contribution of i to $D_t$, that is,

$D_t^i$ = number of tasks processed
in structure state i during [0,t].    (23)

Then, since no tasks are processed in structure state 0 (total
failure), it follows that

$$D_t = \sum_{i=1}^{N} D_t^i. \tag{24}$$

If, further, we introduce the random variables

$W_t^i$ = total time spent in
structure state i during [0,t]    (25)

and let

$$\delta_t^i = \frac{D_t^i}{W_t^i} = \text{average throughput rate in structure state } i \text{ during } [0,t], \tag{26}$$

then, from (24) and (26), we have

$$D_t = \sum_{i=1}^{N} \delta_t^i W_t^i.$$

Substituting this in (22), we can express Y as a function of lower
level random variables, viz.

$$Y = \frac{1}{\alpha_t t} \sum_{i=1}^{N} \delta_t^i W_t^i \tag{27}$$

where, except for $\alpha_t$, each variable relates exclusively to a
particular structure state.

In view of this formulation, suppose now that the system is such
that the utilization time and the average failure times of the
resources are much larger than the average interarrival time of
incoming tasks and the average processing time of a processor, i.e.,

$$t,\ 1/\lambda_p,\ 1/\lambda_b \gg 1/\alpha,\ 1/\mu. \tag{28}$$

This case, and cases similar to it, prevail in most computing system
applications since the quantities on the left are usually multiples of
hours while those on the right are typically fractions of seconds.
For example, if t = 10 hours and $1/\alpha$ = 1 second then $\alpha t$ = 36,000.
Assuming (28) (as we do throughout the remainder of the discussion),
from the formulation of $\lambda_i$ (see (16)),

$$t,\ 1/\lambda_i \gg 1/\alpha,\ 1/\mu$$

and hence, with high probability,

$$W_t^i \gg 1/\alpha,\ 1/\mu.$$

In other words, the time spent in a structure state is likely to be
long compared to the intertransition times among the internal states
of that structure (see Fig. 4).

Therefore, to a good first approximation, the internal state
behavior in structure state i can be viewed as the long run,
equilibrium behavior of the process $X_{I,i}$ (Fig. 3). Moreover, these
processes represent familiar systems in their own right where, letting
$S_i$ denote the system modeled by $X_{I,i}$, recall (19) that

$S_i$ = M/M/i/i+L queueing system.    (29)

Accordingly, if we let

$\delta^i$ = average throughput rate
of $S_i$ (in the long run)

then, by the definition of $\delta_t^i$ (26),

$$\delta_t^i \approx \delta^i$$

and, since $t \gg 1/\alpha$,

$$\alpha_t \approx \lim_{t \to \infty} \alpha_t = \alpha.$$

Taking these approximations to be identities and substituting in (27),
we find that Y can be approximated by the expression:

$$Y = \sum_{i=1}^{N} \frac{\delta^i W_t^i}{\alpha t}.$$

If, further, we define

$$r_i = \frac{\delta^i}{\alpha} = \text{normalized average throughput rate of } S_i \text{ (in the long run)}, \tag{30}$$

we obtain the following convenient formulation of the performance
variable Y (normalized average throughput rate during [0,t]):

$$Y = \sum_{i=1}^{N} r_i \frac{W_t^i}{t}. \tag{31}$$

Interpreting the terms of this formula, by definition (30) $r_i$
approximates the normalized average throughput rate realized while the system
is in structure state i; by definition (25), $W_t^i/t$ is just the fraction
of time the system actually spends in state i. Thus equation (31) is
intuitively quite plausible.
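Formula (31) also suggests a simple way to estimate the distribution of Y by simulation, which can serve as a cross-check on any closed-form result. The sketch below is not the paper's solution method: it samples the structure process directly, assuming exponential holding times with the accumulated failure rate of (16), recovery with the coverage of (17) while at least two processors remain, and strictly positive failure rates. The rates in `r` and all numeric values are made up for illustration.

```python
import random

def sample_Y(N, L, lam_p, lam_b, c_p, r, t, rng):
    """Draw one sample of Y = (1/t) * sum_i r[i] * W_i over [0, t]."""
    i, clock, y = N, 0.0, 0.0
    while i > 0:
        lam_i = i * lam_p + L * lam_b          # failure rate in structure state i
        hold = rng.expovariate(lam_i)          # time until the next fault
        if clock + hold >= t:                  # utilization period ends first
            return y + r[i] * (t - clock) / t
        y += r[i] * hold / t
        clock += hold
        c_i = i * lam_p * c_p / lam_i          # coverage in structure state i
        # Assumption: recovery is only possible while i >= 2.
        i = i - 1 if (i > 1 and rng.random() < c_i) else 0
    return y                                   # system failed before time t

def empirical_pdf(samples, y):
    """Fraction of samples not exceeding y, an estimate of Prob[Y <= y]."""
    return sum(s <= y for s in samples) / len(samples)

rng = random.Random(7)
r = {1: 0.6, 2: 0.9}                           # illustrative rates, not from the paper
samples = [sample_Y(2, 4, 0.05, 0.001, 0.95, r, 10.0, rng) for _ in range(2000)]
```

Calling `empirical_pdf(samples, y)` for a grid of y values traces out an estimate of the probability distribution function of Y for the chosen parameters.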

Mathematically, the performance variable Y is now expressed as a
function of lower level variables $r_i$ and $W_t^i$ ($1 \le i \le N$). By their
definitions, each variable $r_i$ can be solved in terms of the
equilibrium behavior of its corresponding queueing model $X_{I,i}$. As is well
known (see [16], [17], for example), the equilibrium distribution of
each $r_i$ is deterministic (i.e., $r_i$ assumes a constant value with
probability 1) whence Y reduces to a linear combination of the
(dependent) variables $W_t^1, W_t^2, \ldots, W_t^N$. Accordingly, the first step is to
obtain closed-form solutions of the equilibrium rates $r_1, r_2, \ldots, r_N$.


Equilibrium Solutions 


As defined above (29), the system $S_i$ may be viewed as an ideal,
fault-free version of S when the number of processors N is equal to i.
With this view and on comparing (30) with (6), it follows that $r_i$ is
just the long run performance of this ideal, i-processor system.
Reverting to our original definition of the performance variable (3),
we thus obtain an alternative interpretation of $r_i$, namely

$r_i$ = fraction of arrived tasks
processed by $S_i$ (in the long run).    (32)

Moreover, since the fraction of arrived tasks that remain in $S_i$
becomes negligible in the long run (there are at most i+L tasks in
$S_i$), if we let

$s_i$ = fraction of arrived tasks
rejected by $S_i$ (in the long run),    (33)

then

$$r_i = 1 - s_i. \tag{34}$$

The above is a convenient formulation of $r_i$ since the quantity $s_i$
relates directly to the equilibrium state behavior of the queueing
model $X_{I,i}$ (Fig. 3). Since $X_{I,i}$ is ergodic, the time average $s_i$ is
equal to the probability, in equilibrium, that an arriving task is
rejected, i.e., an arriving task finds the process in state i+L (full
queue). Stating this more precisely, if we let $X = X_{I,i}$ and K = i + L
then

$$s_i = \lim_{t \to \infty} \mathrm{Prob}[X_t = K \mid \text{task arrives at time } t].$$

Due to the purely random nature of Poisson arrivals, it can be shown
further (see [16], pp. 117-119) that the above coincides with the
(unconditional) equilibrium probability $p_K$ of X being in state K, that
is,

$$s_i = p_K = \lim_{t \to \infty} \mathrm{Prob}[X_t = K]. \tag{35}$$

Substituting back in (34), we have the pleasant (and somewhat
intuitive) conclusion that

$$r_i = 1 - p_K = 1 - p_{i+L}. \tag{36}$$

In other words, the normalized average throughput rate of $S_i$ (in
equilibrium) is just the equilibrium probability of the queue not
being full.
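The conclusion above translates into a few lines of code. The sketch below computes the equilibrium distribution of the birth-death submodel from its balance equations and returns one minus the probability of the full state; the function name `normalized_throughput` is an illustrative choice, and the routine is a numerical stand-in rather than the authors' closed-form expression.

```python
def normalized_throughput(i, L, alpha, mu):
    """r_i = 1 - p_K for an M/M/i/i+L queue, with K = i + L."""
    K = i + L
    weights = [1.0]                              # unnormalized p_j, taking p_0 = 1
    for j in range(1, K + 1):
        # Balance equation: p_j * min(j, i) * mu = p_{j-1} * alpha
        weights.append(weights[-1] * alpha / (min(j, i) * mu))
    p_K = weights[-1] / sum(weights)             # equilibrium probability of a full system
    return 1.0 - p_K
```

For example, a single processor with no buffer and equal arrival and processing rates rejects every second task, so `normalized_throughput(1, 0, 1.0, 1.0)` is 0.5.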

As $S_i$ is an M/M/i/K queue, the general solution of $p_K$ (35) is
known (see [17], Appendix C, Table 8, for example) and can be
expressed as a function of i and the model parameters L = K - i, $\alpha$ and
$\mu$ (see Table 1). Moreover, the dependence on $\alpha$ and $\mu$ is only through
their ratio

$$u = \frac{\alpha}{\mu} \tag{37}$$

(the so-called "traffic intensity"). By (36), these remarks obviously
apply as well to the solution of $r_i$ which, in a general form, can be
expressed as follows:



(38) 
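For reference, one standard way to write the equilibrium solution that (36) calls for is sketched below. It follows from the textbook birth-death balance equations for an M/M/i/i+L queue, with $u = \alpha/\mu$ denoting the traffic intensity, and is offered as an assumption about the general form rather than as a transcription of the authors' expression.

```latex
r_i \;=\; 1 - p_{i+L},
\qquad
p_{i+L} \;=\;
\frac{\dfrac{u^{i}}{i!}\left(\dfrac{u}{i}\right)^{L}}
     {\displaystyle\sum_{j=0}^{i-1}\frac{u^{j}}{j!}
      \;+\; \frac{u^{i}}{i!}\sum_{k=0}^{L}\left(\frac{u}{i}\right)^{k}}
```

For i = 1 the denominator is the geometric sum $1 + u + \cdots + u^{L+1}$, so the expression collapses to $r_1 = (1 - u^{L+1})/(1 - u^{L+2})$ when $u \ne 1$.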


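More compact and more meaningful expressions are obtained for specific
instances of i, e.g., for i = 1, 2 we have the following solutions:

$$r_1 = \begin{cases} \dfrac{1 - u^{L+1}}{1 - u^{L+2}} & \text{if } u \ne 1 \\[2ex] \dfrac{L+1}{L+2} & \text{if } u = 1 \end{cases} \tag{39}$$

$$r_2 = \begin{cases} \dfrac{1 + \frac{u}{2} - 2\left(\frac{u}{2}\right)^{L+2}}{1 + \frac{u}{2} - 2\left(\frac{u}{2}\right)^{L+3}} & \text{if } u \ne 2 \\[2ex] \dfrac{2L+3}{2L+5} & \text{if } u = 2 \end{cases} \tag{40}$$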
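The one- and two-processor cases are easy to evaluate in code. The sketch below writes them as the standard loss results for M/M/1 and M/M/2 queues with a finite buffer of capacity L, with u the ratio of arrival rate to processing rate; the function names `r1` and `r2` are illustrative, and the formulas are an assumption in the sense that they are derived from the textbook balance equations.

```python
def r1(L, u):
    """Normalized throughput for one processor (i = 1) and buffer capacity L."""
    if u == 1:
        return (L + 1) / (L + 2)
    return (1 - u ** (L + 1)) / (1 - u ** (L + 2))

def r2(L, u):
    """Normalized throughput for two processors (i = 2) and buffer capacity L."""
    if u == 2:
        return (2 * L + 3) / (2 * L + 5)
    h = u / 2                                   # load per processor
    return (1 + h - 2 * h ** (L + 2)) / (1 + h - 2 * h ** (L + 3))
```

Both expressions are ratios of quantities that vanish at the excluded point (u = 1 or u = 2), so values of u extremely close to it lose floating-point precision; the separate branch covers the exact case only.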


Generally, it can be shown that, for fixed L, the normalized
average throughput rate $r_i$ is a monotonically decreasing function of u
where, in the limit, $r_i \to 0$ as $u \to \infty$. On the other hand, for fixed u,
$r_i$ is a monotonically increasing function of the buffer capacity L (as
one would expect since the larger the buffer, the less chance there is
of losing a task). Accordingly, the limiting form of $r_i$, as $L \to \infty$,
provides an upper bound (as a function of u) on the value of $r_i$. Taking
formal limits of (38) for various restrictions on the value of $u/i$, we
have

$$\lim_{L \to \infty} r_i = \begin{cases} 1 & \text{if } u/i \le 1 \\ i/u & \text{if } u/i > 1 \end{cases} \tag{41}$$

Thus, for a very large buffer, the normalized average throughput rate
in structure state i is determined solely by the value of $u/i$ (the
"utilization factor"; see [16], for example). If the task arrival
rate $\alpha$ does not exceed the capacity $i\mu$ then almost all tasks are
processed; if $\alpha > i\mu$ then $r_i$ is approximately equal to the "normalized
capacity" $i/u = i\mu/\alpha$. Although we could examine the functional properties
of $r_i$ in greater detail, they are relatively well understood (by
people familiar with queueing systems) and, for the purpose of the
development that follows, the above observations should suffice.

Solution of Performability 

Since the variables $r_i$ (38) assume constant values for fixed
values of the base model parameters, by (31) the performance variable
Y can be expressed as a linear combination of lower level random
variables, viz.

$$Y = \frac{1}{t} \sum_{i=1}^{N} r_i W_t^i \tag{42}$$

where $W_t^i$ (25) is the total time spent in structure state i during
[0,t]. Moreover, as the variables $W_t^i$ depend only on the structure
model $X_R$ (Fig. 2), $X_R$ can serve as the base model for the remaining
part of the solution process. Accordingly, if equation (42) is
extended to include state i = 0 where, trivially, $r_0 = 0$ (see (30)),
the equilibrium solutions $r_0, r_1, \ldots, r_N$ may be thought of as "yield
rates" assigned to states 0, 1, ..., N respectively. In other words,
the $r_i$ constitute a "reward structure" (see [18]) for the Markov
process $X_R$. To the best of our knowledge, however, the analysis of
reward models has dealt exclusively with the solution of expected
rewards, e.g., for the variable in question, the expected value E[Y]
of Y. Performability evaluation, on the other hand, requires a
complete probabilistic description of Y, as provided by its PDF $F_Y$ (21).

To clarify the approach we adopt in solving $F_Y$, it is helpful to
rephrase equation (42) in terms of a model hierarchy for $X_R$ (see [12],
Def. 3). More specifically, if the sequence of variables
$(W_t^1, W_t^2, \ldots, W_t^N)$ is identified with the level-0 model of a hierarchy
(level-0 is closest to the "top") then (42) can be restated as the
level-0 based capability function $\gamma_0$, i.e.,

$$\gamma_0(w_1, w_2, \ldots, w_N) = \frac{1}{t} \sum_{i=1}^{N} r_i w_i \tag{43}$$

where $w_i$ is the value of variable $W_t^i$ ($w_i \in [0,t]$). If, further, we
let $B_y$ denote the set of accomplishment levels accounted for directly
by the PDF $F_Y$, i.e.,

$$B_y = \{a \mid a \le y\}$$

then

$$F_Y(y) = \mathrm{Prob}[Y \in B_y] = \mathrm{Prob}[\gamma_0(W_t^1, W_t^2, \ldots, W_t^N) \in B_y] = \mathrm{Prob}[(W_t^1, W_t^2, \ldots, W_t^N) \in \gamma_0^{-1}(B_y)]. \tag{44}$$

In other words, if we could characterize the probabilistic nature of
the level-0 model, the desired solution could be obtained by
formulating the probability of the inverse image $\gamma_0^{-1}(B_y)$.

At this level, however, we find that a probabilistic
characterization is difficult to obtain since the random variables
$W_t^1, W_t^2, \ldots, W_t^N$ are (statistically) dependent. This is due to the fact
that the combined times spent in states 1, 2,..., N cannot exceed t.
Thus, for example, $\mathrm{Prob}[W_t^{N-1} > 0 \mid W_t^N = t] = 0$ whereas
$\mathrm{Prob}[W_t^{N-1} > 0 \mid 0 < W_t^N < t] = c_N$ (see (17)), thereby demonstrating the
dependence between $W_t^N$ and $W_t^{N-1}$. In general, whenever performance is
defined with respect to a bounded utilization period, such
dependencies are likely to exist among variables that are closely related to
the performance variable.

To circumvent this difficulty, a possible approach (which, in
retrospect, appears to be the key to solving such problems) is to
search for a lower level model which, at the expense of a more complex
capability function, has a simpler probabilistic description. For the
hierarchy in question, we obtain such a model by considering the times
spent in structure states 1, 2,..., N over the entire unbounded
interval $[0,\infty)$. More precisely, we take the level-1 model to be the
sequence of variables $(V_1, V_2, \ldots, V_N)$ where

$$V_i = \lim_{t \to \infty} W_t^i = \text{time spent in state } i \text{ during } [0,\infty). \qquad (45)$$

Although this level-1 model is no less "abstract" than the level-0
model, it should be clear that it contains more information, thereby
admitting a well-defined "interlevel translation" (see [12]) from
level-1 to level-0. When this translation is composed with $\gamma_0$, the
resulting capability function $\gamma_1$ (based on the level-1 model) can be
formulated as follows. Let $v_i$ denote the value of $V_i$ ($v_i \in [0,\infty)$)
and, for notational convenience, let $\sigma_j$ denote the sum

$$\sigma_j = \sum_{i=j}^{N} v_i, \qquad 1 \le j \le N. \qquad (46)$$


Then it is relatively easy to verify that 






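$$\gamma_1(v_1, v_2, \ldots, v_N) = \begin{cases} \dfrac{1}{t} \sum_{i=1}^{N} r_i v_i & \text{if } \sigma_1 \le t \\[2ex] \dfrac{1}{t} \sum_{i=j+1}^{N} r_i v_i + r_j \left(1 - \dfrac{\sigma_{j+1}}{t}\right) & \text{if } \sigma_{j+1} \le t,\ \sigma_j > t \\[2ex] r_N & \text{if } \sigma_N > t. \end{cases} \qquad (47)$$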
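As an illustration of how the capability function (47) can be evaluated, the following is a minimal sketch in Python. It assumes, as the transition structure of $X_R$ suggests, that states are occupied in decreasing order N, N-1, ..., 1, so that the three cases of (47) reduce to truncating the sojourn times at the end of the utilization period. The function name `gamma1` and the list-based argument layout are illustrative choices, not part of the original development.

```python
def gamma1(v, r, t):
    """Evaluate the level-1 capability function, following the cases of (47).

    v[i-1] : time spent in structure state i over [0, inf), i = 1..N
    r[i-1] : equilibrium (normalized) throughput rate in state i
    t      : length of the utilization period [0, t]
    Assumption: states are occupied in decreasing order N, N-1, ..., 1.
    """
    n = len(v)
    elapsed = 0.0  # time in [0, t] already accounted for
    reward = 0.0   # accumulated r_i * (time in state i within [0, t])
    for i in range(n, 0, -1):
        # Only the part of the sojourn that falls inside [0, t] contributes.
        stay = min(v[i - 1], t - elapsed)
        reward += r[i - 1] * stay
        elapsed += stay
        if elapsed >= t:
            break
    return reward / t


# Hypothetical values for a dual-processor system (N = 2).
rates = [0.5, 0.9]
all_sojourns_end_early = gamma1([1.0, 2.0], rates, 10.0)
period_ends_in_state_1 = gamma1([100.0, 2.0], rates, 10.0)
period_ends_in_state_2 = gamma1([0.0, 20.0], rates, 10.0)
```

The three calls exercise, in turn, the case $\sigma_1 \le t$, the case $\sigma_2 \le t < \sigma_1$, and the case $\sigma_N > t$.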

At the cost of a more complicated capability function (compare
(47) with (43)), we are now at a level where a probabilistic
characterization is easier to obtain. This, in turn, can provide the
solution we seek, for arguing as we did at level-0 (see (44)), we have

$$F_Y(y) = \mathrm{Prob}[(V_1, V_2, \ldots, V_N) \in \gamma_1^{-1}(B_y)]. \qquad (48)$$

To formulate these probabilities, we note first that, over the
unbounded period $[0,\infty)$, a state trajectory (sample path) of $X_R$ (see
Fig. 2) will (with probability 1) pass through a finite sequence of
distinct states, beginning in some initial state i and terminating in
the absorbing state 0. For each state i > 0 that is visited, the
variable $V_i$ (45) is thus the time of a single "sojourn" in state i.
Moreover, since $X_R$ is a Markov process, it is known (see [15], for
example) that these sojourn times are exponentially distributed and
are conditionally independent, given the sequence of states that are
visited.

With these observations, the solution of $F_Y$ can be conveniently
decomposed by considering the conditional PDF of Y with respect to the
random variable

$$U = \text{sequence of states (excluding 0) visited during } [0,\infty). \qquad (49)$$

More specifically, by the transition structure of $X_R$, if a trajectory
begins in state k where k > 0 and ends in state $\ell$ (prior to entering
state 0) then $\ell \le k$ and

$$U = (k, k-1, \ldots, \ell). \qquad (50)$$

If a trajectory begins in state 0 then no states (other than 0) are
visited during $[0,\infty)$, in which case

$$U = \Lambda \text{ (the null sequence)}. \qquad (51)$$

(Thus, for an N-processor system, there are $\frac{N(N+1)}{2} + 1$ possible values
of U.) Accordingly, if we let u denote a value of U and let

$$F_{Y|U} = \text{conditional PDF of } Y \text{ given } U = u \qquad (52)$$

then, by a well known formula, $F_Y(y)$ may be expressed as

$$F_Y(y) = \sum_{u} F_{Y|U}(y|u) \, \mathrm{Prob}[U = u] \qquad (53)$$

where the sum is taken over all possible values of U. Moreover, for a
given u, the terms $F_{Y|U}(y|u)$ and Prob[U = u] can be solved as
follows.

Regarding $F_{Y|U}(y|u)$ and further simplifying notation, let the
level-1 variables (45) be denoted by the single vector-valued variable

$$V = (V_1, V_2, \ldots, V_N)$$

taking values $v = (v_1, v_2, \ldots, v_N)$, and let $C_y$ denote the inverse image
of $B_y$ under the capability function (47), i.e.,

$$C_y = \gamma_1^{-1}(B_y) = \{v \mid \gamma_1(v) \le y\}. \qquad (54)$$

Then, in view of (48), when Y is conditioned by the event U = u,

$$F_{Y|U}(y|u) = \mathrm{Prob}[V \in C_y \mid U = u].$$

This says, in turn, that $F_{Y|U}(y|u)$ can be solved by integrating the
conditional joint probability density function (pdf) of V given U = u,
over the region $C_y$, i.e., if we let

$$f_{V|U} = \text{conditional joint pdf of } V \text{ given } U = u \qquad (55)$$

then

$$F_{Y|U}(y|u) = \int \cdots \int_{C_y} f_{V|U}(v|u) \, dv_1 \, dv_2 \cdots dv_N. \qquad (56)$$

Regarding (56), the formulation of $f_{V|U}(v|u)$ is straightforward,
due to the independence of the sojourn times corresponding to
states in the sequence u. Given u, for each state $i \in u$ (meaning,
with a slight abuse of notation, that i appears in the sequence u), we
know that $V_i$ is exponentially distributed with parameter $\lambda_i$ (see (16)
and Fig. 2); if $i \notin u$ then, with probability 1, $V_i = 0$. Consequently,
by the independence of the $V_i$, if u is nonnull then

$$f_{V|U}(v|u) = \begin{cases} \prod_{i \in u} \lambda_i e^{-\lambda_i v_i} & \text{if } v_j = 0 \text{ for all } j \notin u \\ 0 & \text{if } v_j > 0 \text{ for some } j \notin u. \end{cases} \qquad (57)$$

In case $u = \Lambda$ (the null sequence) the formulation is trivial, i.e.,

$$f_{V|U}(v|\Lambda) = \begin{cases} 1 & \text{if } v_1 = v_2 = \cdots = v_N = 0 \\ 0 & \text{otherwise.} \end{cases} \qquad (58)$$

Performing the indicated integration, on the other hand, is generally
quite difficult, due to the nonlinear form of the capability function
(see (47)). (Results obtained for N = 2 will be illustrated
momentarily.)

As for the second product term in equation (53), the solution of
Prob[U = u] is immediate by inspection of the transition-rate diagram
of $X_R$ (Fig. 2). Given a state sequence u, u may be viewed as a
trajectory of the "imbedded" discrete-time Markov process X obtained by
sampling $X_R$ each time it changes state. Moreover, by inspection of $X_R$,
if $i \ge 2$, then $\mathrm{Prob}[X_{n+1} = i-1 \mid X_n = i] = c_i$ (see (17)) and
$\mathrm{Prob}[X_{n+1} = 0 \mid X_n = i] = 1 - c_i$; if i = 1, $\mathrm{Prob}[X_{n+1} = 0 \mid X_n = 1] = 1$.
Accordingly, if we let $\{p_i \mid 0 \le i \le N\}$ denote the initial state
probability distribution of $X_R$, i.e.,

$$p_i = \mathrm{Prob}[X_{R,0} = i] \qquad (59)$$

then, for a nonnull sequence $u = (k, k-1, \ldots, \ell)$

$$\mathrm{Prob}[U = u] = \begin{cases} p_k c_k c_{k-1} \cdots c_{\ell+1} (1 - c_\ell) & \text{if } k > \ell \ge 2 \\ p_k c_k c_{k-1} \cdots c_2 & \text{if } k > \ell = 1 \\ p_k (1 - c_k) & \text{if } k = \ell \ge 2 \\ p_1 & \text{if } k = \ell = 1. \end{cases} \qquad (60)$$

In case u is the null sequence, the corresponding trajectory must
initially be in state 0; hence

$$\mathrm{Prob}[U = \Lambda] = p_0. \qquad (61)$$
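To make (60) and (61) concrete, the sketch below enumerates every possible value of U for an N-processor system and assigns it a probability. The function name `sequence_probabilities`, the tuple representation of a sequence, and the numerical values are illustrative assumptions; only the probability rules themselves come from (60) and (61).

```python
def sequence_probabilities(p, c):
    """Return {u: Prob[U = u]} following (60) and (61).

    p[i] : initial probability of structure state i, i = 0..N
    c[i] : coverage in state i, used only for i = 2..N
           (c[0] and c[1] are unused placeholders)
    A sequence u = (k, k-1, ..., l) is a tuple; the null sequence is ().
    """
    n = len(p) - 1
    probs = {(): p[0]}  # (61): the trajectory starts in state 0
    for k in range(1, n + 1):
        reach = p[k]  # probability of starting in k and reaching state l
        for l in range(k, 0, -1):
            u = tuple(range(k, l - 1, -1))
            # From state l >= 2 the system fails with probability 1 - c[l];
            # from state 1 the next fault always causes failure.
            stop = 1.0 if l == 1 else 1.0 - c[l]
            probs[u] = reach * stop
            if l >= 2:
                reach *= c[l]  # covered fault: continue to state l - 1
    return probs


# Hypothetical dual-processor values: p_0 = 0.01, p_1 = 0.09, p_2 = 0.9, c_2 = 0.99.
dual = sequence_probabilities([0.01, 0.09, 0.9], [0.0, 0.0, 0.99])
```

For N = 2 the dictionary has four entries, keyed by `()`, `(1,)`, `(2,)` and `(2, 1)`, and the probabilities sum to 1 whenever the initial distribution does.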


This completes the description of the solution procedure which,
in summary, involves the following steps:

1) For each structure state i, apply (38) to determine the
equilibrium solution of the normalized average throughput rate in state
i.

2) For each state sequence u, apply (57), (58) to determine the
conditional joint pdf of V given U = u.

3) For each pdf obtained in 2), apply (56) to determine the PDF of Y
given U = u.

4) For each possible state sequence u, apply (60), (61) to determine
the probability that U = u.

5) Combining the results of 3) and 4), apply (53) to determine the
PDF $F_Y$ for the performance variable Y.
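The steps above yield a closed-form result; as a cross-check, the same quantity can be estimated by simulation. The sketch below is not the procedure described here but a plain Monte Carlo substitute: it samples trajectories of $X_R$ directly (initial state, exponential sojourn times, covered or uncovered faults) and counts how often $Y \le y$. All names and parameter values are hypothetical, and the rates $\lambda_i$ and yields $r_i$ are supplied by the caller rather than derived from (16) and (38).

```python
import random


def sample_y(p, c, lam, r, t, rng):
    """Draw one value of Y by simulating a trajectory of X_R over [0, t].

    p[i], i = 0..N   : initial state probabilities
    c[i], i = 2..N   : coverage in state i
    lam[i], r[i], i = 1..N : transition rate out of state i and its yield rate
    Unused low indices of c, lam and r are placeholders.
    """
    n = len(p) - 1
    state = rng.choices(range(n + 1), weights=p)[0]
    elapsed, reward = 0.0, 0.0
    while state > 0 and elapsed < t:
        sojourn = rng.expovariate(lam[state])  # V_state is exponential
        stay = min(sojourn, t - elapsed)       # part falling inside [0, t]
        reward += r[state] * stay
        elapsed += stay
        # Covered fault: degrade to state - 1; otherwise the system fails.
        if state >= 2 and rng.random() < c[state]:
            state -= 1
        else:
            state = 0
    return reward / t


def estimate_cdf(y, p, c, lam, r, t, samples=100_000, seed=1):
    """Monte Carlo estimate of F_Y(y) = Prob[Y <= y]."""
    rng = random.Random(seed)
    hits = sum(sample_y(p, c, lam, r, t, rng) <= y for _ in range(samples))
    return hits / samples


# Hypothetical dual-processor parameters (index 0 entries are placeholders).
estimate = estimate_cdf(
    0.7,
    p=[0.01, 0.09, 0.9],
    c=[0.0, 0.0, 0.99],
    lam=[0.0, 0.02, 0.05],
    r=[0.0, 0.5, 0.9],
    t=10.0,
    samples=20_000,
)
```

Such an estimate carries sampling error, which is one reason a closed-form expression for $F_Y$ is preferable when it can be obtained.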



Dual-Processor Example 

To illustrate this procedure and, particularly, the kind of
solutions it is capable of producing, let us consider the case of a
buffered dual-processor (N = 2).

Table 1. Base model parameters.

DEFINITION                                            EQUATION
task arrival rate                                     (8)
number of processors (N)                              (9)
buffer capacity (L)                                   (10)
processor failure rate                                (12)
processor coverage                                    (13)
processor processing rate                             (18)
buffer stage failure rate                             (14)
buffer failure rate                                   (15)
transition rate from structure state i ($\lambda_i$)         (16)
coverage in structure state i ($c_i$)                   (17)

[Table 2. Closed-form solution of the PDF of the performance variable Y: $F_Y(y) = \mathrm{Prob}[Y \le y]$.]

Step 1)

The equilibrium solutions $r_1$ and $r_2$ have already been considered
and are given by equations (39) and (40).

Step 2)

When N = 2, there are four state sequences to consider:
$u_1 = (2,1)$, $u_2 = (2)$, $u_3 = (1)$, and $u_4 = \Lambda$. Interpreting these
sequences, if a state trajectory (of $X_R$) has sequence $u_1$ then the
system is initially fault-free and, during $[0,\infty)$, recovers from a single
processor fault before failing. If the sequence is $u_2$, the system is
initially fault-free but fails on the first occurrence of a processor
or buffer fault. $u_3$ says that one processor is initially faulty; $u_4$
says that the system failed prior to utilization. Letting $f_i$ denote
the pdf of V given $U = u_i$ and applying (57), (58) we have

$$f_1(v|u_1) = \lambda_1 e^{-\lambda_1 v_1} \lambda_2 e^{-\lambda_2 v_2},$$

$$f_2(v|u_2) = \begin{cases} \lambda_2 e^{-\lambda_2 v_2} & \text{if } v_1 = 0,\ v_2 \ge 0 \\ 0 & \text{if } v_1 > 0, \end{cases}$$

$$f_3(v|u_3) = \begin{cases} \lambda_1 e^{-\lambda_1 v_1} & \text{if } v_1 \ge 0,\ v_2 = 0 \\ 0 & \text{if } v_2 > 0, \end{cases}$$

$$f_4(v|u_4) = \begin{cases} 1 & \text{if } v_1 = v_2 = 0 \\ 0 & \text{otherwise.} \end{cases}$$

Step 3)

To obtain the integrals (56) of these pdf's over the region
$C_y = \{v \mid \gamma_1(v) \le y\}$, it is necessary to characterize $C_y$ for various
ranges of y so as to determine the specific limits of integration.
This is done by specializing $\gamma_1$ (47) to the case in point (N = 2) and
examining the boundary $\gamma_1^{-1}(y)$ that delimits the region $C_y$. Thus, for
example, if y is in the range $r_1 < y < r_2$, then $C_y$ is the region of
the $v_1$-$v_2$ plane depicted in Fig. 5. For convenience in stating the
resulting solutions, let $F_i$ denote the PDF of Y given $U = u_i$ (52) and
let $\nu_i$ denote the quantity

$$\nu_i = \frac{\lambda_i t}{r_i} \qquad (62)$$

which, when $r_i$ is fully expressed, is a function of base model
parameters L, $\alpha$, and $\mu$ as well as $\lambda_i$ and t. Then, for the instance where
i = 1 and y is in the range $r_1 < y < r_2$, the integration according to
(56) of $f_1$ over $C_y$ (see Fig. 5) yields the solution

$$F_1(y) = 1 - \frac{\nu_2 \, e^{-\nu_1 y - (\nu_2 - \nu_1) \frac{r_2 (y - r_1)}{r_2 - r_1}} - \nu_1 e^{-\nu_2 y}}{\nu_2 - \nu_1}.$$

Solutions of $F_1(y)$ in other ranges of y and solutions of the other $F_i$
are obtained in a like manner.
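The limits of integration for this instance can be checked numerically. The sketch below assumes that the quantity in (62) is $\nu_i = \lambda_i t / r_i$ and that $r_1 < y < r_2$ with $\nu_1 \ne \nu_2$; under those assumptions, specializing (47) to N = 2 shows that $C_y$ contains every $v_1$ when $v_2 \le t(y - r_1)/(r_2 - r_1)$, and only $v_1 \le (ty - r_2 v_2)/r_1$ when $v_2$ lies between that bound and $ty/r_2$. The function names and parameter values are hypothetical.

```python
import math


def f1_closed_form(y, lam1, lam2, r1, r2, t):
    """Closed-form F_1(y) for r1 < y < r2, assuming nu_i = lam_i * t / r_i."""
    nu1, nu2 = lam1 * t / r1, lam2 * t / r2
    bend = (nu2 - nu1) * r2 * (y - r1) / (r2 - r1)
    top = nu2 * math.exp(-nu1 * y - bend) - nu1 * math.exp(-nu2 * y)
    return 1.0 - top / (nu2 - nu1)


def f1_numeric(y, lam1, lam2, r1, r2, t, steps=20_000):
    """Integrate the joint pdf of (V_1, V_2) given U = (2, 1) over C_y.

    The inner integral over v1 is done analytically; the outer integral
    over v2 uses the midpoint rule.
    """
    a = t * (y - r1) / (r2 - r1)  # for v2 <= a, every v1 lies in C_y
    b = t * y / r2                # for v2 > b, no v1 lies in C_y
    total = 1.0 - math.exp(-lam2 * a)  # Prob[V_2 <= a]
    h = (b - a) / steps
    for k in range(steps):
        v2 = a + (k + 0.5) * h
        v1_max = (t * y - r2 * v2) / r1           # upper limit for v1
        inner = 1.0 - math.exp(-lam1 * v1_max)    # Prob[V_1 <= v1_max]
        total += lam2 * math.exp(-lam2 * v2) * inner * h
    return total


# Hypothetical parameter values for the comparison.
args = dict(lam1=0.02, lam2=0.05, r1=0.5, r2=0.9, t=10.0)
closed = f1_closed_form(0.7, **args)
numeric = f1_numeric(0.7, **args)
```

With these values the two results should agree to within the discretization error of the midpoint rule; a disagreement would point to an error in the characterization of $C_y$.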

Step 4)

By the definitions of $u_1, \ldots, u_4$ and on applying (60), (61), we have

$$\mathrm{Prob}[U = u_1] = p_2 c_2,$$

$$\mathrm{Prob}[U = u_2] = p_2 (1 - c_2),$$

$$\mathrm{Prob}[U = u_3] = p_1,$$

$$\mathrm{Prob}[U = u_4] = p_0 = 1 - (p_1 + p_2).$$

Step 5)

Applying equation (53) to the results obtained in steps 3) and
4), a closed-form solution of $F_Y$, expressed in terms of y, the $r_i$ and
the quantities defined in (62), is displayed in Table 2. For the reader
interested in seeing the solutions that were obtained as a result of
Step 3), we note that they may be resurrected by considering the
following special cases of $F_Y$:

$$F_Y(y) = \begin{cases} F_1(y) & \text{if } p_2 = 1,\ c_2 = 1 \\ F_2(y) & \text{if } p_2 = 1,\ c_2 = 0 \\ F_3(y) & \text{if } p_1 = 1 \\ F_4(y) & \text{if } p_0 = 1. \end{cases}$$

Given this solution of $F_Y$, we have thus obtained a closed-form
solution of the performability $p_S$ for intervals of the form
$B_y = \{a \mid a \le y\}$, i.e.,

$$p_S(B_y) = F_Y(y).$$

To get a clearer picture of what this solution looks like, Figs. 6 and
7 display plots of $p_S(B_y) = F_Y(y)$ as a function of y for various
choices of t and the base model parameters. Fig. 6 considers the
system where t = 10, $\alpha = \mu = 1.5$, $\lambda_p = 0.01$, $\lambda_b = 0.001$, $c_p = 0.99$,
$p_2 = 0.9$, $p_1 = 0.09$, and $p_0 = 0.01$; the figure furnishes several plots
showing how $F_Y(y)$ varies as L ranges from 1 to 25 in steps of 2. Fig.
7 is similar to Fig. 6 except that $p_2 = 1.0$ while $p_1 = p_0 = 0.0$.

Finally, as an illustration of how such a closed-form solution
can be used to examine design tradeoffs, the buffer capacity L is an
example of a design choice which influences both performance and
reliability in a compensating manner. Were performance the only issue,
then L should be made as large as possible (subject to other practical
constraints such as cost) since the larger the buffer, the higher the
normalized average throughput rate (see (40)). On the other hand, if
reliability is the only issue, then no buffer at all (L = 0) is the
best choice since it will minimize the probability of system failure.
Realistically, however, both performance and reliability are issues
and, when considered simultaneously, we find that the performability
(relative to a specified set B) can be optimized by an appropriate
choice of L. For example, suppose $B = \{y \mid y > 0.8\}$, i.e., the system S
performs within B if the normalized average throughput rate is greater
than 0.8. Then, for the parameter values of Fig. 6, the variation of
$p_S(B)$ as a function of buffer capacity L is displayed in Fig. 8. In
particular, we see that the optimum buffer capacity is 5 for this
choice of parameter values.

This is but one example of how such a closed-form solution of 
performability might be applied. Indeed, for the solution in question 
(Table 2), we have only begun to investigate its implications. There- 
fore, we intend to continue our exploration of various properties of 
this solution. We also want to investigate how the modeling and solu- 
tion techniques discussed herein might be extended so as to apply to a 
more general class of systems. 



[Fig. 1. Block diagram of C.]

[Fig. 4. State-transition-rate diagram of]

[Fig. 5. Region $C_y$ in the $v_1$-$v_2$ plane for y in the range $r_1 < y < r_2$; axis marks at t and $t(r_2 - y)/(r_2 - r_1)$.]

[Fig. 7. Plot of $p_S(B_y) = F_Y(y)$ as a function of y for the choices of t and the base model parameters shown above. L varies from 1 to 25 in steps of 2.]